Take Home Exercise 1

Overview

In this example, we use the following libraries:

  • sf

  • tidyverse

  • tmap

  • spdep

  • funModeling

  • shinyjs

The below code chunk below will load the following libraries. If the libraries are not installed, installation will begin and the libraries will be loaded after.

pacman::p_load(sf, spdep, tmap, tidyverse, funModeling, shinyjs)

Dataset

Data will be from 2 sources. One dataset will provide information on water points. This data is aspatial. The other provides information on Local Government Area boundary of Nigeria. This data is geospatial.

Aspatial data

We will be using the WPdx+ dataset from WPdx Global Data Repositories. Data downloaded is in point shapefile format.

Geospatial data

We will be using the Nigeria Level-2 Administrative Boundary polygon features GIS data. Data is downloaded from geoBoundaries and is in shapefile format.


Objective

  1. Load dataset with appropriate coordinate systems

  2. Determine proportion of functional and non-function water point at Local Goverment Area boundary level after join dataset together

    • Plot thematic map using proportion of function, nonfunctional and functional vs nonfunctional as a data classification
  3. Perform outlier/cluster analysis with various spatial association methods

    • Plot thematic map of clusters and outliers
  4. Perform hotspot analysis with spatial association method

    • Plot thematic map of hot and cold spots

1. Import data

We will save the water point, point shapefile into a simple feature object data table using st_read(). A sf data table is a data frame with columns as its attributes and rows as its features with a geometry column. Since we are looking at Nigeria water points only, we will filter the data using %>%.

Data is in WSG84 Geographic coordinate system. We will input crs = 4326 such that data is parsed correctly.

wp <- st_read(dsn = "geodata",
        layer = "geo_export",
        crs = 4326) %>%
  filter(clean_coun == "Nigeria") 
Reading layer `geo_export' from data source 
  `/Users/gladwinlam/R Quarto/gladwinlam/ISSS624/Takehome/Takehome1/geodata' 
  using driver `ESRI Shapefile'
Simple feature collection with 406566 features and 72 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -92.05073 ymin: -27.31495 xmax: 92.32694 ymax: 26.65622
Geodetic CRS:  WGS 84

There are 73 attributes with 95,008 rows.

Using the st_geometry() we can see the first 5 geometries of the features. They are in point format.

st_geometry(wp)
Geometry set for 95008 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 2.707441 ymin: 4.301812 xmax: 14.21828 ymax: 13.86568
Geodetic CRS:  WGS 84
First 5 geometries:

Using glimpse(), we can see the 73 attributes and its data format.

glimpse(wp)
Rows: 95,008
Columns: 73
$ row_id      <dbl> 429068, 222071, 160612, 160669, 160642, 160628, 160632, 64…
$ source      <chr> "GRID3", "Federal Ministry of Water Resources, Nigeria", "…
$ lat_deg     <dbl> 7.980000, 6.964532, 6.486940, 6.727570, 6.779900, 6.955560…
$ lon_deg     <dbl> 5.120000, 3.597668, 7.929720, 7.648670, 7.664850, 7.779170…
$ date_repor  <date> 2018-08-29, 2015-08-16, 2020-12-04, 2020-12-04, 2020-12-0…
$ time_repor  <chr> "00:00:00.000", "00:00:00.000", "00:00:00.000", "00:00:00.…
$ status_id   <chr> "Unknown", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"…
$ water_sour  <chr> NA, "Borehole", "Borehole", "Borehole", "Borehole", "Boreh…
$ water_so_2  <chr> NA, "Well", "Well", "Well", "Well", "Well", "Well", "Well"…
$ water_te_2  <chr> "Tapstand", "Mechanized Pump", "Hand Pump", NA, "Hand Pump…
$ X_water_tec <chr> "Tapstand", "Mechanized Pump", "Hand Pump", NA, "Hand Pump…
$ facility_t  <chr> "Improved", "Improved", "Improved", "Improved", "Improved"…
$ clean_coun  <chr> "Nigeria", "Nigeria", "Nigeria", "Nigeria", "Nigeria", "Ni…
$ clean_adm1  <chr> "Ekiti", "Ogun", "Ebonyi", "Enugu", "Enugu", "Benue", "Ben…
$ clean_adm2  <chr> "Moba", "Obafemi-Owode", "Ohaukwu", "Isi-Uzo", "Isi-Uzo", …
$ clean_adm3  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ clean_adm4  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ install_ye  <dbl> NA, NA, NA, NA, NA, NA, NA, 2016, 2016, 2015, 2016, NA, NA…
$ installer   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ rehab_year  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ rehabilita  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ manageme_2  <chr> NA, "Other", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ status_cle  <chr> NA, "Functional", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ pay         <chr> NA, "No", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ fecal_coli  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ fecal_co_2  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ subjective  <chr> NA, "Acceptable quality", NA, NA, NA, NA, NA, NA, NA, NA, …
$ activity_i  <chr> "1a117ba2-5256-4801-874c-f7611a4499dd", NA, NA, NA, NA, NA…
$ scheme_id   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ wpdx_id     <chr> "6FV7X4JC+222", "6FR5XH7X+R37", "6FR9FWPH+QVH", "6FR9PJHX+…
$ notes       <chr> "Tap Water", "Ajura", NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ orig_lnk    <chr> "https://nigeria.africageoportal.com/datasets/GRID3::grid3…
$ photo_lnk   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ country_id  <chr> "NG", "NG", "NG", "NG", "NG", "NG", "NG", "NG", "NG", "NG"…
$ data_lnk    <chr> "https://catalog.waterpointdata.org/datasets/grid3-nigeria…
$ distance_t  <dbl> 767.3742, 13364.9005, 9492.7619, 9319.0815, 5437.7141, 159…
$ distance_2  <dbl> 921.79187, 48.87743, 4333.34280, 23276.33227, 18783.56566,…
$ distance_3  <dbl> 3146.733237, 4167.519068, 693.211204, 307.716194, 134.6120…
$ distance_4  <dbl> 41049.944, 13898.649, 27381.922, 34823.605, 40785.885, 469…
$ distance_5  <dbl> 959.365, 9405.783, 72060.118, 31665.447, 31240.904, 44105.…
$ water_poin  <chr> "{\"2018-08-29\": {\"source\": \"GRID3\", \"status_id\": \…
$ rehab_prio  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 94, 628, 11742…
$ served_pop  <dbl> NA, 140, 0, 492, 868, 81, 0, 9969, 933, 1998, 4222, 94, 62…
$ local_popu  <dbl> NA, 887, 0, 492, 1919, 81, 0, 13740, 933, 11854, 22331, 26…
$ crucialnes  <dbl> NA, 0.1578354, NA, 1.0000000, 0.4523189, 1.0000000, NA, 0.…
$ pressure    <dbl> NA, 0.1400000, NA, 1.6400000, 2.8933333, 0.2700000, NA, 9.…
$ usage_cap   <dbl> 250, 1000, 300, 300, 300, 300, 300, 1000, 300, 300, 1000, …
$ is_urban    <chr> "True", "False", "False", "False", "False", "False", "Fals…
$ days_since  <dbl> 1483, 2592, 655, 655, 655, 655, 655, 2178, 2178, 2733, 211…
$ staleness_  <dbl> 62.65911, 44.17405, 81.34550, 81.34550, 81.34550, 81.34550…
$ is_latest   <chr> "T", "T", "T", "T", "T", "T", "T", "T", "T", "T", "T", "T"…
$ location_i  <dbl> 358773, 264633, 397972, 397984, 397982, 397976, 397979, 32…
$ cluster_si  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ clean_co_2  <chr> "NGA", "NGA", "NGA", "NGA", "NGA", "NGA", "NGA", "NGA", "N…
$ country_na  <chr> "Nigeria", "Nigeria", "Nigeria", "Nigeria", "Nigeria", "Ni…
$ water_so_3  <chr> "Tap", "Improved Tube well or borehole", "Borehole fitted …
$ water_tech  <chr> NA, "Motorised", NA, NA, NA, NA, NA, "Submersible", "India…
$ status      <chr> NA, "Functional (and in use)", NA, NA, NA, NA, NA, NA, NA,…
$ adm2        <chr> NA, "Obafemi-Owode", NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ adm3        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ management  <chr> NA, "Other", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ adm1        <chr> NA, "Ogun", "Enugu", "Enugu", "Enugu", "Enugu", "Enugu", N…
$ lat_deg_or  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ lat_lon_de  <chr> "(7.98?, 5.12?)", "(6.9645317?, 3.5976683?)", "(6.48694?, …
$ lon_deg_or  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ public_dat  <chr> "https://catalog.waterpointdata.org/datafiles/grid3-nigeri…
$ converted   <chr> NA, "#status_id, #water_source, #pay, #status, #management…
$ count       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ date_creat  <date> 2021-12-06, 2020-06-30, 2020-12-21, 2020-12-21, 2020-12-2…
$ time_creat  <chr> "21:12:57.000", "12:56:07.000", "19:26:15.000", "19:26:15.…
$ date_updat  <date> 2021-12-06, 2020-06-30, 2020-12-21, 2020-12-21, 2020-12-2…
$ time_updat  <chr> "21:12:57.000", "12:56:07.000", "19:26:15.000", "19:26:15.…
$ geometry    <POINT [°]> POINT (5.12 7.98), POINT (3.597668 6.964532), POINT …

As we can see from above, data is in a geographic coordinate system (GCS). For such system, units are angular and features are defined in 3D compared to a projected coordinate system (PCS) where units are linear and features are defined in 2D.

Image by (Smith, 2020) from here

A PCS is needed when distance is needed to be calculated and when you want to draw data on a flat map. As we will be doing distance based analysis, we will transform the GCS to PCS using the below code chunk:

wp_proj <- st_transform(wp, crs = 26391)
st_geometry(wp_proj)
Geometry set for 95008 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 32536.82 ymin: 33461.24 xmax: 1292096 ymax: 1091052
Projected CRS: Minna / Nigeria West Belt
First 5 geometries:

Next, we save the sf data table using write_rds() of readr package in rds data format. We will refer to wp_nga variable from here onward.

write_rds(wp_proj, "geodata/wp_nga.rds")

Next, we import the NGA boundary data of Nigeria into a simple feature data table.

nga <- st_read(dsn = "geodata",
               layer = "nga_admbnda_adm2_osgof_20190417",
               crs = 4326)
Reading layer `nga_admbnda_adm2_osgof_20190417' from data source 
  `/Users/gladwinlam/R Quarto/gladwinlam/ISSS624/Takehome/Takehome1/geodata' 
  using driver `ESRI Shapefile'
Simple feature collection with 774 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS:  WGS 84

Similarly, shape file is in a multipolygon shape file using a GCS of WGS84. To be consistent, we will also transform from the data from a GCS to a PCS.

nga_proj <- st_transform(nga, crs = 26391)
st_geometry(nga_proj)
Geometry set for 774 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 28879.72 ymin: 30292.37 xmax: 1343798 ymax: 1094244
Projected CRS: Minna / Nigeria West Belt
First 5 geometries:
glimpse(nga_proj)
Rows: 774
Columns: 17
$ Shape_Leng <dbl> 0.2370744, 0.2624772, 3.0753158, 2.5379842, 0.6871498, 1.06…
$ Shape_Area <dbl> 0.0015239210, 0.0035311037, 0.3268678399, 0.0683785064, 0.0…
$ ADM2_EN    <chr> "Aba North", "Aba South", "Abadam", "Abaji", "Abak", "Abaka…
$ ADM2_PCODE <chr> "NG001001", "NG001002", "NG008001", "NG015001", "NG003001",…
$ ADM2_REF   <chr> "Aba North", "Aba South", "Abadam", "Abaji", "Abak", "Abaka…
$ ADM2ALT1EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM2ALT2EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1_EN    <chr> "Abia", "Abia", "Borno", "Federal Capital Territory", "Akwa…
$ ADM1_PCODE <chr> "NG001", "NG001", "NG008", "NG015", "NG003", "NG011", "NG02…
$ ADM0_EN    <chr> "Nigeria", "Nigeria", "Nigeria", "Nigeria", "Nigeria", "Nig…
$ ADM0_PCODE <chr> "NG", "NG", "NG", "NG", "NG", "NG", "NG", "NG", "NG", "NG",…
$ date       <date> 2016-11-29, 2016-11-29, 2016-11-29, 2016-11-29, 2016-11-29…
$ validOn    <date> 2019-04-17, 2019-04-17, 2019-04-17, 2019-04-17, 2019-04-17…
$ validTo    <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ SD_EN      <chr> "Abia South", "Abia South", "Borno North", "Federal Capital…
$ SD_PCODE   <chr> "NG00103", "NG00103", "NG00802", "NG01501", "NG00302", "NG0…
$ geometry   <MULTIPOLYGON [m]> MULTIPOLYGON (((552560.3 12..., MULTIPOLYGON (…

From the summary table, there are many NA values in some of the columns.

We will save the sf data table into a rds file for the LGA data.

write_rds(nga_proj, "geodata/nga.rds")

We can take a lot at the base map of LGA Nigeria boundaries.

plot(st_geometry(nga_proj))

2. Task 1

Based on the above, we have used sf method to import and save it in a simple feature data frame .rds format and converting the GCS to PCS using EPSG 26391.


3. Data Wrangling

The “status_cle” of water point data frame contains the statuses of the water point on whether it is functional, non functional or others.

read_rds("geodata/wp_nga.rds") %>%
  pull(status_cle) %>%
  table(useNA = "ifany")
.
                       Abandoned         Abandoned/Decommissioned 
                             175                              234 
                      Functional      Functional but needs repair 
                           45883                             4579 
       Functional but not in use Non functional due to dry season 
                            1686                                7 
                  Non-Functional Non-Functional due to dry season 
                           29385                             2403 
                            <NA> 
                           10656 

3.1 Removing NAs

R interprets NA as blank when instead it actually means unknown. We can use mutate() function from dplyr package. mutate() is a window function which applies a desired operation to every row in the sf data frame. Here we will replace “NA” with “Unknown” and overwrite the wp_nga.rds file.

wp_nga <- read_rds("geodata/wp_nga.rds") %>%
  mutate(status_cle = replace_na(status_cle, "Unknown"))

We use fre() function from funModeling library to draw frequency bar charts of the various functions.

freq(data=wp_nga, 
     input = 'status_cle')

                        status_cle frequency percentage cumulative_perc
1                       Functional     45883      48.29           48.29
2                   Non-Functional     29385      30.93           79.22
3                          Unknown     10656      11.22           90.44
4      Functional but needs repair      4579       4.82           95.26
5 Non-Functional due to dry season      2403       2.53           97.79
6        Functional but not in use      1686       1.77           99.56
7         Abandoned/Decommissioned       234       0.25           99.81
8                        Abandoned       175       0.18           99.99
9 Non functional due to dry season         7       0.01          100.00

3.2 Categorizing Water Point Status

We can categorise the status into 3 groups - Functional, Non Function, Unknown

We break the category each into a dataframe on its own. We use the filter() function from dplyr.
Functional Non-Functional Unknown
Functional Non-Functional Unknown
Functional but needs repair Non Functional due to dry season
Functional but not in use Abandoned/Decommissioned
Abandoned
Non functional due to dry season
wpt_functional <- wp_nga %>%
  filter(status_cle %in%
           c("Functional", 
             "Functional but not in use",
             "Functional but needs repair"))

wpt_nonfunctional <- wp_nga %>%
  filter(status_cle %in%
           c("Abandoned/Decommissioned", 
             "Abandoned",
             "Non-Functional",
             "Non functional due to dry season",
             "Non-Functional due to dry season"))

wpt_unknown <- wp_nga %>%
  filter(status_cle == "Unknown")

3.3 Exploratory Data Analysis of Functional vs Non Functional Water Points

For functional water points, the breakdown is:

freq(data=wpt_functional, 
     input = 'status_cle')

                   status_cle frequency percentage cumulative_perc
1                  Functional     45883      87.99           87.99
2 Functional but needs repair      4579       8.78           96.77
3   Functional but not in use      1686       3.23          100.00

For non - functional water points, the breakdown is:

freq(data=wpt_nonfunctional, 
     input = 'status_cle')

                        status_cle frequency percentage cumulative_perc
1                   Non-Functional     29385      91.25           91.25
2 Non-Functional due to dry season      2403       7.46           98.71
3         Abandoned/Decommissioned       234       0.73           99.44
4                        Abandoned       175       0.54           99.98
5 Non functional due to dry season         7       0.02          100.00

4. Task 2 &3

Objective: Derive proportion of functional and non functional water point at LGA level and combing the 2 data tables into 1 data table.

To do so, we will need to compute the number of water points in each LGA boundary polygon and see how many of them fall in functional and non functional categories. We can perform a point in polygon method to do the counting.

First, we use st_intersects() which overlays the water points from wp_proj onto nga_proj. We then use lengths() function to count the number of water points within each polygon

nga <- read_rds("geodata/nga.rds")
nga
Simple feature collection with 774 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 28879.72 ymin: 30292.37 xmax: 1343798 ymax: 1094244
Projected CRS: Minna / Nigeria West Belt
First 10 features:
   Shape_Leng  Shape_Area        ADM2_EN ADM2_PCODE       ADM2_REF ADM2ALT1EN
1   0.2370744 0.001523921      Aba North   NG001001      Aba North       <NA>
2   0.2624772 0.003531104      Aba South   NG001002      Aba South       <NA>
3   3.0753158 0.326867840         Abadam   NG008001         Abadam       <NA>
4   2.5379842 0.068378506          Abaji   NG015001          Abaji       <NA>
5   0.6871498 0.014528691           Abak   NG003001           Abak       <NA>
6   1.0624151 0.037864573      Abakaliki   NG011001      Abakaliki       <NA>
7   1.1871953 0.062915960 Abeokuta North   NG028001 Abeokuta North       <NA>
8   0.3159713 0.004609051 Abeokuta South   NG028002 Abeokuta South       <NA>
9   0.8775862 0.021879492            Abi   NG009001            Abi       <NA>
10  0.7120426 0.014377693    Aboh-Mbaise   NG017001    Aboh-Mbaise       <NA>
   ADM2ALT2EN                   ADM1_EN ADM1_PCODE ADM0_EN ADM0_PCODE
1        <NA>                      Abia      NG001 Nigeria         NG
2        <NA>                      Abia      NG001 Nigeria         NG
3        <NA>                     Borno      NG008 Nigeria         NG
4        <NA> Federal Capital Territory      NG015 Nigeria         NG
5        <NA>                 Akwa Ibom      NG003 Nigeria         NG
6        <NA>                    Ebonyi      NG011 Nigeria         NG
7        <NA>                      Ogun      NG028 Nigeria         NG
8        <NA>                      Ogun      NG028 Nigeria         NG
9        <NA>               Cross River      NG009 Nigeria         NG
10       <NA>                       Imo      NG017 Nigeria         NG
         date    validOn validTo                     SD_EN SD_PCODE
1  2016-11-29 2019-04-17    <NA>                Abia South  NG00103
2  2016-11-29 2019-04-17    <NA>                Abia South  NG00103
3  2016-11-29 2019-04-17    <NA>               Borno North  NG00802
4  2016-11-29 2019-04-17    <NA> Federal Capital Territory  NG01501
5  2016-11-29 2019-04-17    <NA>      Akwa Ibom North West  NG00302
6  2016-11-29 2019-04-17    <NA>              Ebonyi North  NG01103
7  2016-11-29 2019-04-17    <NA>              Ogun Central  NG02801
8  2016-11-29 2019-04-17    <NA>              Ogun Central  NG02801
9  2016-11-29 2019-04-17    <NA>       Cross River Central  NG00901
10 2016-11-29 2019-04-17    <NA>                  Imo East  NG01701
                         geometry
1  MULTIPOLYGON (((552560.3 12...
2  MULTIPOLYGON (((551048.7 12...
3  MULTIPOLYGON (((1245549 106...
4  MULTIPOLYGON (((510602.3 57...
5  MULTIPOLYGON (((598086 1218...
6  MULTIPOLYGON (((663801.7 25...
7  MULTIPOLYGON (((81132.88 37...
8  MULTIPOLYGON (((109265.1 34...
9  MULTIPOLYGON (((635478 2187...
10 MULTIPOLYGON (((543608.8 15...

We append 4 columns into nga sf data table - total water point, water point functional, water point non-functional and water point unknown and save it under a new sf data table called nga_wp.

nga_wp <- nga %>% 
  mutate(`total_wpt` = lengths(
    st_intersects(nga, wp_nga))) %>%
  mutate(`wpt_func` = lengths(
    st_intersects(nga, wpt_functional))) %>%
  mutate(`wpt_nonfunc` = lengths(
    st_intersects(nga, wpt_nonfunctional))) %>%
  mutate(`wpt_unknown` = lengths(
    st_intersects(nga, wpt_unknown)))

Next, we compute the percentage of the functional categories.

nga_wp <- nga_wp %>%
  mutate(`pct_func` = `wpt_func`/`total_wpt`)

Next, we compute the percentage of the non-functional categories.

nga_wp <- nga_wp %>%
  mutate(`pct_nonfunc` = `wpt_nonfunc`/`total_wpt`)

Next, we compute the percentage of the unknown categories.

nga_wp <- nga_wp %>%
  mutate(`pct_unknown` = `wpt_unknown`/`total_wpt`)
nga_wp %>%
  select(c(`pct_func`,`pct_nonfunc`,`pct_unknown`))
Simple feature collection with 774 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 28879.72 ymin: 30292.37 xmax: 1343798 ymax: 1094244
Projected CRS: Minna / Nigeria West Belt
First 10 features:
    pct_func pct_nonfunc pct_unknown                       geometry
1  0.4117647   0.5294118  0.05882353 MULTIPOLYGON (((552560.3 12...
2  0.4084507   0.4929577  0.09859155 MULTIPOLYGON (((551048.7 12...
3        NaN         NaN         NaN MULTIPOLYGON (((1245549 106...
4  0.4035088   0.5964912  0.00000000 MULTIPOLYGON (((510602.3 57...
5  0.4791667   0.5208333  0.00000000 MULTIPOLYGON (((598086 1218...
6  0.3519313   0.1802575  0.46781116 MULTIPOLYGON (((663801.7 25...
7  0.4705882   0.4411765  0.08823529 MULTIPOLYGON (((81132.88 37...
8  0.6050420   0.2773109  0.11764706 MULTIPOLYGON (((109265.1 34...
9  0.5197368   0.4078947  0.07236842 MULTIPOLYGON (((635478 2187...
10 0.2727273   0.3939394  0.33333333 MULTIPOLYGON (((543608.8 15...

As the focus is on analysing the non functional water points, we will zoom into the polygons with non functional water points.

summary(nga_wp$`pct_nonfunc`)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
 0.0000  0.2211  0.3559  0.3654  0.5082  1.0000      13 

From above, we can see the mean percentage of non functional water points in each LGA level is 36.6%. There are also 13 NAs. We will deal with it using the code chunk below:

nga_wp$pct_func <- ifelse(nga_wp$total_wpt == 0 & is.na(nga_wp$pct_func), 0, nga_wp$pct_func)

nga_wp$pct_nonfunc <- ifelse(nga_wp$total_wpt == 0 & is.na(nga_wp$pct_nonfunc), 0, nga_wp$pct_nonfunc)

nga_wp$pct_unknown <- ifelse(nga_wp$total_wpt == 0 & is.na(nga_wp$pct_unknown), 0, nga_wp$pct_unknown)
summary(nga_wp$`pct_nonfunc`)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.2105  0.3505  0.3592  0.5076  1.0000 

After removing NAs and replacing it with 0, the new average percentage of non functional water point is 35.9%.

top_n(nga_wp, 5, `pct_nonfunc`)$SD_EN
[1] "Kaduna Central" "Bayelsa West"   "Adamawa North"  "Adamawa North" 
[5] "Kogi East"     

These are the top 5 areas with highest percentage of non functional water points.

ggplot(data=nga_wp, 
       aes(x= as.numeric(`pct_nonfunc`)))+
  geom_histogram(bins=30, 
                 color="black", 
                 fill="white") +
  labs(title = "Distribution of Non-Functioning Water Points in Nigeria",
      x = "Proportion of faulty water points (%)",
      y = "Frequency") +
  geom_density(alpha=.2, fill="#FF6666") + 
  geom_vline(aes(xintercept=mean(`pct_nonfunc`)),
            color="blue", linetype="dashed", size=1)

From the histogram above, we can see distribution is positively skewed. The areas with very high percentage of non functional water points are more rare.

4.1 Choropleth Mapping

In this section we will construct Thematic Mapping to show the spatial distribution of functional and non functional water point rate at LGA level using tmap library.

The thematic mapping palette can be chosen from a range of options. We can use the below code to find out the various options:

tmaptools::palette_explorer()

Firstly, the thematic mapping is based on the discrete data class interval classification of the percentage of functional and non functional water point. The way we classify the intervals will have an impact on the thematic representation of the map.

The classification options we have are: "cat""fixed""sd""equal""pretty""quantile""kmeans""hclust""bclust""fisher""jenks""dpih""headtails"

The jenk natural breaks style below classified into intervals by identifying groups of similar values and maximizes the differences.

tm_shape(nga_wp)+
  tm_fill("pct_nonfunc",
          style = "jenks",
          palette = "Purples",
          legend.hist = TRUE) +
  tm_layout(main.title = "Distribution of Non Functional water points by LGA level (Jenks classification)",
            main.title.size = 1,
            legend.outside = TRUE)+
  tm_borders(lwd = 0.5,  alpha = 1, col = "black")

Next, we can trying building class intervals equally.

tm_shape(nga_wp)+
  tm_fill("pct_nonfunc",
          style = "equal",
          palette = "Purples",
          legend.hist = TRUE) +
  tm_layout(main.title = "Distribution of Non Functional water points by LGA level (Equal classification)",
            main.title.size = 1,
            legend.outside = TRUE)+
  tm_borders(lwd = 0.5,  alpha = 1, col = "black")

We can also use quantile.

tm_shape(nga_wp)+
  tm_fill("pct_nonfunc",
          style = "quantile",
          palette = "Purples",
          legend.hist = TRUE) +
  tm_layout(main.title = "Distribution of Non Functional water points by LGA level (Quantile classification)",
            main.title.size = 1,
            legend.outside = TRUE)+
  tm_borders(lwd = 0.5,  alpha = 1, col = "black")

We can also split using bagged clustering to cluster the percentages into intervals.

tm_shape(nga_wp)+
  tm_fill("pct_nonfunc",
          style = "bclust",
          palette = "Purples",
          legend.hist = TRUE) +
  tm_layout(main.title = "Distribution of Non Functional water points by LGA level (Bagged Clustering classification)",
            main.title.size = 1,
            legend.outside = TRUE)+
  tm_borders(lwd = 0.5,  alpha = 1, col = "black")

Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering

Are they different? To analyse we can use tmap_arrange() to put them together to compare side by side

nf_1 <-tm_shape(nga_wp)+
  tm_fill("pct_nonfunc",
          style = "jenks",
          palette = "Purples",
          legend.hist = TRUE) +
  tm_layout(main.title = "Non Functioal (Jenks classification)",
            main.title.size = 1,
            legend.outside = TRUE)+
  tm_borders(lwd = 0.5,  alpha = 1, col = "black")

nf_2 <- tm_shape(nga_wp)+
  tm_fill("pct_nonfunc",
          style = "equal",
          palette = "Purples",
          legend.hist = TRUE) +
  tm_layout(main.title = "Non Functioal (Equal classification)",
            main.title.size = 1,
            legend.outside = TRUE)+
  tm_borders(lwd = 0.5,  alpha = 1, col = "black")

nf_3 <- tm_shape(nga_wp)+
  tm_fill("pct_nonfunc",
          style = "quantile",
          palette = "Purples",
          legend.hist = TRUE) +
  tm_layout(main.title = "Non Functioal (Quantile classification)",
            main.title.size = 1,
            legend.outside = TRUE)+
  tm_borders(lwd = 0.5,  alpha = 1, col = "black")

nf_4 <- tm_shape(nga_wp)+
  tm_fill("pct_nonfunc",
          style = "bclust",
          palette = "Purples",
          legend.hist = TRUE) +
  tm_layout(main.title = "Non Functional (Bagged Clustering\nclassification)",
            main.title.size = 1,
            legend.outside = TRUE)+
  tm_borders(lwd = 0.5,  alpha = 1, col = "black")


tmap_arrange(nf_1, nf_2, nf_3, nf_4)
Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering

Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering

From the above, we can see the thematic of the map changes based on how we classify the data. There seems to be some clustering of similarly non functional percentage of water points in LGA level boundaries. However we should do a formal statistical test to see if there is spatial association between the LGA level boundaries instead of relying on visualisation to draw conclusions.

We will select bagged classification as the bagged classification is a popular clustering machine learning technique with proven good performance.

We do the same for functional water points.

a1 <-tm_shape(nga_wp)+
  tm_fill("pct_func",
          style = "jenks",
          palette = "YlGn",
          legend.hist = TRUE) +
  tm_layout(main.title = "Functional Points (Jenks classification)",
            main.title.size = 1,
            legend.outside = TRUE)+
  tm_borders(lwd = 0.5,  alpha = 1, col = "black")

a2 <- tm_shape(nga_wp)+
  tm_fill("pct_func",
          style = "equal",
          palette = "YlGn",
          legend.hist = TRUE) +
  tm_layout(main.title = "Functional Points (Equal classification)",
            main.title.size = 1,
            legend.outside = TRUE)+
  tm_borders(lwd = 0.5,  alpha = 1, col = "black")

a3<- tm_shape(nga_wp)+
  tm_fill("pct_func",
          style = "quantile",
          palette = "YlGn",
          legend.hist = TRUE) +
  tm_layout(main.title = "Functional Points (Quantile classification)",
            main.title.size = 1,
            legend.outside = TRUE)+
  tm_borders(lwd = 0.5,  alpha = 1, col = "black")

a4 <- tm_shape(nga_wp)+
  tm_fill("pct_func",
          style = "bclust",
          palette = "YlGn",
          legend.hist = TRUE) +
  tm_layout(main.title = "Functional Points (Bagged Clustering\nclassification)",
            main.title.size = 1,
            legend.outside = TRUE)+
  tm_borders(lwd = 0.5,  alpha = 1, col = "black")


tmap_arrange(a1, a2, a3, a4)
Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering

Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering

We can put functional and non functional water point distribution maps side by side

tmap_arrange(a4, nf_4)
Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering
Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering

Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering
Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering

As expected, polygons where there are more functional water points have less non-functional water points and vice versa.

We can also create an interactive map by using tmap_mode(“view”). We can add bubbles using tm_bubbles() and add labels to include the cities. Below code chunk is used to overlay the percentage of non functional water points on the map. The size of the bubble is proportion to the percentage.

tmap_mode("view")
tm_shape(nga_wp) + 
  tm_bubbles(size = "pct_nonfunc", col = "red", border.lwd = 2, border.col = "black") +
  tm_tiles("Stamen.TonerLabels")

We create another version for functional water points.

tmap_mode("view")
tm_shape(nga_wp) + 
  tm_bubbles(size = "pct_func", col = "blue", border.lwd = 2, border.col = "white") +
  tm_tiles("Stamen.TonerLabels")

For both functional and non functional water points, we will select the data class interval using bagged cluster style.

Next, we reduce the overall sf data table size of nga_wp by selecting only the useful columns. We save the new table into a new .rds file

nga_wp_s <- nga_wp %>%
  select(3:4, 9:10, 18:23)

write_rds(nga_wp_s, "geodata/nga_wp_s.rds")
nga_wp_s <- read_rds("geodata/nga_wp_s.rds")
nga_wp_s
Simple feature collection with 774 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 28879.72 ymin: 30292.37 xmax: 1343798 ymax: 1094244
Projected CRS: Minna / Nigeria West Belt
First 10 features:
          ADM2_EN ADM2_PCODE ADM1_PCODE ADM0_EN total_wpt wpt_func wpt_nonfunc
1       Aba North   NG001001      NG001 Nigeria        17        7           9
2       Aba South   NG001002      NG001 Nigeria        71       29          35
3          Abadam   NG008001      NG008 Nigeria         0        0           0
4           Abaji   NG015001      NG015 Nigeria        57       23          34
5            Abak   NG003001      NG003 Nigeria        48       23          25
6       Abakaliki   NG011001      NG011 Nigeria       233       82          42
7  Abeokuta North   NG028001      NG028 Nigeria        34       16          15
8  Abeokuta South   NG028002      NG028 Nigeria       119       72          33
9             Abi   NG009001      NG009 Nigeria       152       79          62
10    Aboh-Mbaise   NG017001      NG017 Nigeria        66       18          26
   wpt_unknown  pct_func pct_nonfunc                       geometry
1            1 0.4117647   0.5294118 MULTIPOLYGON (((552560.3 12...
2            7 0.4084507   0.4929577 MULTIPOLYGON (((551048.7 12...
3            0 0.0000000   0.0000000 MULTIPOLYGON (((1245549 106...
4            0 0.4035088   0.5964912 MULTIPOLYGON (((510602.3 57...
5            0 0.4791667   0.5208333 MULTIPOLYGON (((598086 1218...
6          109 0.3519313   0.1802575 MULTIPOLYGON (((663801.7 25...
7            3 0.4705882   0.4411765 MULTIPOLYGON (((81132.88 37...
8           14 0.6050420   0.2773109 MULTIPOLYGON (((109265.1 34...
9           11 0.5197368   0.4078947 MULTIPOLYGON (((635478 2187...
10          22 0.2727273   0.3939394 MULTIPOLYGON (((543608.8 15...

5. Task 4

Objective: Perform a cluster and outlier analysis

Based on the previous visualisation, LGA boundaries where there are higher proportion of non-functional water points seem to appear close to other LGA boundaries with high non-functional water points. To validate this hypothesis, we can perform analysis of spatial association to see if these proportion variables indeed are connected spatially.

This association analysis is done in 2 steps:

  1. Check if there is spatial association using Global spatial autocorrelation measures

  2. If there is global autocorrelation, we can use the local spatial autocorrelation measures to check if the polygon is a cluster or outlier using the value of the test statistics

5.1 Choosing a Spatial Weighting Method

Spatial association analysis always involves neighbours of the polygon in focus. We assign weights to the neighbours surrounding the polygon being studied in a matrix by calculating and summing up their weighted average.

There are several methods assigning spatial weighting methods:

  • Contiguity based weighting method (based on shared boundaries)

  • Distanced based weighting method

  • Inverse distance weighting method

  • K nearest neighbour weighting method

As seen from base map, there is a wide variation in polygon size across Nigeria. Additionally wp_nga is a point data. Hence I will be choosing the distance based weighting method.

There are 2 options for distance based weighting methods:

  • Fixed distance weight matrix

  • Adaptive distance weight matrix

I will be using the adaptive distance weight matrix because the polygons of the LGA boundaries in nga_wp are in different shapes and sizes. If fixed distance is used, larger polygons may have less neighbours compared to smaller polygons with the same centroid to centroid distance. Also more dense areas have more neighbours than less dense areas.

The below code chunk identifies the longitude and latitude of each centroid using st_centroid function().

longitude <- map_dbl(nga_wp_s$geometry, ~st_centroid(.x)[[1]])
latitude <- map_dbl(nga_wp_s$geometry, ~st_centroid(.x)[[2]])

We combine them together using cbind() by joining them column wise.

coords <- cbind(longitude, latitude)
head(coords)
     longitude  latitude
[1,]  549364.0  123694.9
[2,]  547123.4  120376.5
[3,] 1189496.9 1059770.9
[4,]  489057.4  534262.6
[5,]  593718.2  113824.1
[6,]  642618.7  251222.3

I will use the knn2nb() function from spdep package to create a neighbour list of class nb. knearneigh() identifies the neighbours of each polygon and knn2nb() puts them in a list. In this scenario, we are using K nearest neighbour adaptive distance approach to identify the neighbours. We select 6 neighbours to start with.

set.seed(1)
knn6 <- knn2nb(knearneigh(coords, k=6))
knn6
Neighbour list object:
Number of regions: 774 
Number of nonzero links: 4644 
Percentage nonzero weights: 0.7751938 
Average number of links: 6 
Non-symmetric neighbours list

We take a glimpse of the list of neighbours with str().

str(knn6)
List of 774
 $ : int [1:6] 2 364 548 597 624 721
 $ : int [1:6] 1 548 597 624 721 725
 $ : int [1:6] 250 261 447 507 509 526
 $ : int [1:6] 20 263 446 454 466 690
 $ : int [1:6] 203 208 331 334 539 738
 $ : int [1:6] 170 217 218 337 379 553
 $ : int [1:6] 8 176 214 281 544 555
 $ : int [1:6] 7 214 281 306 544 555
 $ : int [1:6] 18 19 218 337 576 757
 $ : int [1:6] 25 216 325 528 552 632
 $ : int [1:6] 26 27 68 191 565 762
 $ : int [1:6] 135 263 417 446 690 695
 $ : int [1:6] 31 37 393 570 583 584
 $ : int [1:6] 170 363 546 577 581 589
 $ : int [1:6] 22 49 177 297 306 580
 $ : int [1:6] 30 187 296 328 357 360
 $ : int [1:6] 35 295 378 460 638 639
 $ : int [1:6] 9 19 218 574 576 601
 $ : int [1:6] 9 18 103 376 574 576
 $ : int [1:6] 4 106 239 419 454 466
 $ : int [1:6] 60 61 162 269 520 596
 $ : int [1:6] 49 297 326 443 515 623
 $ : int [1:6] 54 291 292 537 618 619
 $ : int [1:6] 123 476 527 652 673 761
 $ : int [1:6] 10 181 216 314 325 552
 $ : int [1:6] 11 27 191 336 562 762
 $ : int [1:6] 11 26 191 439 663 762
 $ : int [1:6] 29 178 299 300 358 369
 $ : int [1:6] 173 178 358 378 460 591
 $ : int [1:6] 16 39 41 186 192 360
 $ : int [1:6] 13 211 289 570 583 584
 $ : int [1:6] 51 62 461 462 515 693
 $ : int [1:6] 166 227 238 655 743 750
 $ : int [1:6] 42 104 136 213 559 757
 $ : int [1:6] 17 275 276 277 295 460
 $ : int [1:6] 107 247 408 455 681 759
 $ : int [1:6] 38 40 570 583 584 629
 $ : int [1:6] 39 40 41 186 320 570
 $ : int [1:6] 30 38 40 41 186 320
 $ : int [1:6] 37 38 39 41 186 570
 $ : int [1:6] 30 38 39 40 192 634
 $ : int [1:6] 86 136 137 499 613 718
 $ : int [1:6] 11 68 157 524 590 645
 $ : int [1:6] 45 192 303 328 360 634
 $ : int [1:6] 44 290 303 328 360 599
 $ : int [1:6] 387 429 438 521 668 742
 $ : int [1:6] 33 166 234 238 698 750
 $ : int [1:6] 65 113 265 386 482 701
 $ : int [1:6] 22 297 326 515 623 693
 $ : int [1:6] 36 98 107 409 432 681
 $ : int [1:6] 32 62 461 462 623 693
 $ : int [1:6] 78 165 293 532 602 636
 $ : int [1:6] 52 78 80 165 621 636
 $ : int [1:6] 23 79 293 294 532 536
 $ : int [1:6] 122 246 333 430 571 605
 $ : int [1:6] 77 376 533 576 601 728
 $ : int [1:6] 58 199 312 322 621 622
 $ : int [1:6] 57 322 323 603 621 622
 $ : int [1:6] 88 128 129 493 700 748
 $ : int [1:6] 61 563 578 592 596 626
 $ : int [1:6] 21 60 269 578 596 626
 $ : int [1:6] 32 51 461 462 515 693
 $ : int [1:6] 90 384 416 467 765 772
 $ : int [1:6] 48 65 74 113 131 407
 $ : int [1:6] 48 64 74 113 265 683
 $ : int [1:6] 103 104 331 338 351 574
 $ : int [1:6] 347 348 566 609 640 694
 $ : int [1:6] 43 157 191 549 590 645
 $ : int [1:6] 140 146 274 473 500 512
 $ : int [1:6] 71 299 341 343 344 610
 $ : int [1:6] 173 298 299 343 344 625
 $ : int [1:6] 566 567 568 609 638 639
 $ : int [1:6] 361 374 377 404 665 666
 $ : int [1:6] 65 109 265 683 741 754
 $ : int [1:6] 272 398 422 433 485 501
 $ : int [1:6] 254 427 470 547 647 677
 $ : int [1:6] 56 195 533 534 579 728
 $ : int [1:6] 52 79 165 215 532 636
 $ : int [1:6] 54 78 165 532 618 636
 $ : int [1:6] 52 53 78 165 215 739
 $ : int [1:6] 99 145 233 426 689 760
 $ : int [1:6] 15 49 51 177 352 580
 $ : int [1:6] 132 258 383 414 529 767
 $ : int [1:6] 24 148 437 482 673 692
 $ : int [1:6] 105 394 654 675 707 712
 $ : int [1:6] 42 136 137 499 613 718
 $ : int [1:6] 149 151 221 226 399 486
 $ : int [1:6] 59 150 489 648 700 714
 $ : int [1:6] 260 408 463 542 674 676
 $ : int [1:6] 63 163 236 237 384 710
 $ : int [1:6] 160 271 406 475 492 525
 $ : int [1:6] 119 390 391 392 487 656
 $ : int [1:6] 354 402 594 607 665 666
 $ : int [1:6] 31 158 436 561 596 709
 $ : int [1:6] 390 391 392 405 469 656
 $ : int [1:6] 97 139 389 420 451 653
 $ : int [1:6] 96 389 420 451 662 773
 $ : int [1:6] 50 117 153 231 432 696
 $ : int [1:6] 81 145 426 667 760 769
  [list output truncated]
 - attr(*, "region.id")= chr [1:774] "1" "2" "3" "4" ...
 - attr(*, "call")= language knearneigh(x = coords, k = 6)
 - attr(*, "sym")= logi FALSE
 - attr(*, "type")= chr "knn"
 - attr(*, "knn-k")= num 6
 - attr(*, "class")= chr "nb"

We can also visualise the neighbourhood linkages using a plot. See the code chunk below:

plot(nga_wp_s$geometry, border="lightgrey")
plot(knn6, coords, pch = 19, cex = 0.3, add = TRUE, col = "red")

The above knn2b() stores the neighbours into a list but it is not a spatial weight matrix. We can convert it into a spatial object object using nb2listw() function.

The style argument can take in “W”, “B”, “C”, “U”, “minmax” and “S”.

  • “B” - basic binary coding

  • “W” - row standardised

  • “C” - globally standardised

  • “U” - “C” but divided by total number of neighbours

  • “S” - Variance stabilising encoding

We will use the basic binary coding here for simplicity.

knn_lw <- nb2listw(knn6, style = 'B', zero.policy = TRUE)
summary(knn_lw)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 774 
Number of nonzero links: 4644 
Percentage nonzero weights: 0.7751938 
Average number of links: 6 
Non-symmetric neighbours list
Link number distribution:

  6 
774 
774 least connected regions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 with 6 links
774 most connected regions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 with 6 links

Weights style: B 
Weights constants summary:
    n     nn   S0   S1     S2
B 774 599076 4644 8342 113854

5.2 Measuring Global Spatial Autocorrelation

We need to check if there is Global Spatial Autocorrelation. This can be performed using statistical testing to determine if there is spatial autocorrelation globally. The hypothesis are below:

H0: Observed spatial pattern of values is likely as any other spatial pattern. Values at one location do not depend on neighbouring location. There is spatial randomness and changing values of one location does not affect another.

H1: There is spatial dependencies. Changing values of one location affects another.

There are are 2 ways of measuring global spatial autocorrelation:

  • Global Moran’s I

  • Global Geary’s C

5.2.1 Using Global Moran’s I

The Moran I statistics can be calculated as above. The assumption to use the test is that the spatial data is normal and randomised.

Positive I (I>0) means a feature has neighbouring features that are similar and this is a cluster feature. Zero I (I=0) means a feature has neighbouring features that are randomly distributed. There is no association. Negative I (I<0) eans a feature has neighbouring features that are dissimilar and this is an dispursed feature.

We can use moran.test() of spdep package to run the test. The zero.policy argument adds in a list of 0 vectors for polygons who do not have neighbours.

Non Functional water points

moran.test(nga_wp_s$pct_nonfunc, 
           listw=knn_lw, 
           zero.policy = TRUE, 
           na.action=na.omit)

    Moran I test under randomisation

data:  nga_wp_s$pct_nonfunc  
weights: knn_lw    

Moran I statistic standard deviate = 24.73, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     0.4830241817     -0.0012936611      0.0003835295 

Based on the results above, P value < 0.05. We can reject the null hypothesis, that there is spatial autocorrelation between neighbours. Moran’s I statistics is also positive so it points towards clustered observations.

Since we are unsure whether the spatial data conforms to normality and randomisation, we can calculate the mean value of Moran’s I using a Monte Carlo simulation with 1000 rounds.

set.seed(1)

monte_moran_test <- moran.mc(nga_wp_s$pct_nonfunc, 
                listw=knn_lw, 
                nsim=999,
                zero.policy = TRUE, 
                na.action=na.omit)

monte_moran_test

    Monte-Carlo simulation of Moran I

data:  nga_wp_s$pct_nonfunc 
weights: knn_lw  
number of simulations + 1: 1000 

statistic = 0.48302, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
hist(nga_wp_s$pct_nonfunc, 
     freq=TRUE, 
     breaks=20, 
     xlab= "Monte Carlo Global Moran's I Sampling")
abline(v=0, 
       col="blue")

Comparing the Global Moran’s I statisitcs between monte carlo simulation and without, they are largely similar. The p values are both < 0.05 and hence we can confidently reject the null hypothesis that for proportion of non functional water point in Nigeria, there is spatial autocorrelation. This means the distribution of non functional water point in Nigeria is uneven and is higher than it would have been if randomly distributed.

Functional water points

We perform the similar steps for proportion of functional water point.

set.seed(1)

mc_mtest_f <- moran.mc(nga_wp_s$pct_func, 
                listw=knn_lw, 
                nsim=999,
                zero.policy = TRUE, 
                na.action=na.omit)

mc_mtest_f

    Monte-Carlo simulation of Moran I

data:  nga_wp_s$pct_func 
weights: knn_lw  
number of simulations + 1: 1000 

statistic = 0.5356, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater

Since p value < 0.05, we rejct null hypothesis. we can conclude that for proportion of functional water point in Nigeria, there is spatial autocorrelation. This means the distribution of functional water point in Nigeria is not even and is higher than it would have been if randomly distribution.

5.2.2 Using Global Geary’s C

Apart from Moran’s I, we can also use Global Geary’s C.

Large C (C>3) means a feature has neighbouring features that are dissimilar and this is a dispersed feature. C =1 means a feature and neighbouring features are randomly arranged. Small C (C<1) means a feature has neighbouring features that are similar and this is a cluster feature.

Non Functional water points

set.seed(1)
mc_gc_nf <- geary.mc(nga_wp_s$pct_nonfunc, 
               listw=knn_lw,
               zero.policy = TRUE,
               nsim=999)
mc_gc_nf

    Monte-Carlo simulation of Geary C

data:  nga_wp_s$pct_nonfunc 
weights: knn_lw 
number of simulations + 1: 1000 

statistic = 0.507, observed rank = 1, p-value = 0.001
alternative hypothesis: greater

Geary C test results align with Moran’s I with a small C statistics result that is statistically significant. We are confident there are spatial autocorrelation in the distribution of non functional water point across Nigeria.

Functional water points

set.seed(1)
mc_gc_f <- geary.mc(nga_wp_s$pct_func, 
               listw=knn_lw,
               zero.policy = TRUE,
               nsim=999)
mc_gc_f

    Monte-Carlo simulation of Geary C

data:  nga_wp_s$pct_func 
weights: knn_lw 
number of simulations + 1: 1000 

statistic = 0.4571, observed rank = 1, p-value = 0.001
alternative hypothesis: greater

Geary C test results align with Moran’s I with a small C statistics result that is statistically significant. We are confident there are spatial autocorrelation in the distribution of functional water points across Nigeria.

5.3 Spatial Correlogram

In this section we will identify how the non functional and functional water points proportion varies with increasing distance from each polygon for each polygon. We use sp.correlogram() from spdep to compute the spatially lagged values and use plot() to display the information on a plot.

Style = “B” means binary encoding. Method = “I” means using Moran’s I statistics.

Non Functional water points

GC_corr_nf <- sp.correlogram(knn6, 
                          nga_wp_s$pct_nonfunc, 
                          order=20, 
                          method="I", 
                          style="B",
                          zero.policy = TRUE)
plot(GC_corr_nf)

As we can see, there are autocorrelations with up to 10 lags. We need to consider more neighbours in our analysis for non functional water points.

Functional water points

GC_corr_f <- sp.correlogram(knn6, 
                          nga_wp_s$pct_func, 
                          order=20, 
                          method="I", 
                          style="B",
                          zero.policy = TRUE)
plot(GC_corr_f)

As we can see, there are autocorrelations with up to 12 lags. We need to consider more neighbours in our analysis for functional water points.

5.4 Cluster and Outlier Analysis

Global Moran I and Global Geary C statistics are only able to tell us if there are global spatial autocorrelation. We concluded that there is. However to identify which polygon has spatial autocorrelation and which does not we will need to turn towards local indicators of spatial association (LISA).

LISA can be computed using Local Moran I statistics with localmoran() function from spdep library. It will output a matrix with:

  • Ii: local Moran’s I statistics

  • E.Ii: expectation of local moran statistic

  • Var.Ii: variance of local moran statistic

  • Z.Ii: standard deviate of local moran statistic

  • Pr(): p-value of local moran statistic

Non Functional water points

localMI_nf <- localmoran(nga_wp_s$pct_nonfunc, knn_lw)
head(localMI_nf)
         Ii         E.Ii    Var.Ii      Z.Ii Pr(z != E(Ii))
1  2.340371 -0.005088610  3.909757 1.1861866   2.355486e-01
2  2.946740 -0.003141883  2.414802 1.8982955   5.765716e-02
3 17.532309 -0.022680865 17.375361 4.2114695   2.537148e-05
4  1.184000 -0.009891309  7.593749 0.4332483   6.648344e-01
5  5.580031 -0.004588478  3.525782 2.9741707   2.937816e-03
6  3.550849 -0.005630315  4.325577 1.7100084   8.726431e-02
colnames(cbind(nga_wp_s,localMI_nf))
 [1] "ADM2_EN"        "ADM2_PCODE"     "ADM1_PCODE"     "ADM0_EN"       
 [5] "total_wpt"      "wpt_func"       "wpt_nonfunc"    "wpt_unknown"   
 [9] "pct_func"       "pct_nonfunc"    "Ii"             "E.Ii"          
[13] "Var.Ii"         "Z.Ii"           "Pr.z....E.Ii.." "geometry"      

Next we append the local Moran I values into the nga_wp_s data table

nga_wp_s_lmi_nf <- cbind(nga_wp_s,localMI_nf) %>%
  rename("Prob(Li)"="Pr.z....E.Ii..")

5.4.1 Visualizing Moran I values

We can plot a choropleth mapping using tm_shape() and tm_fill() from tmap library. The thematic mapping will be based on the Prob(Li) values.

prob_nf <- tm_shape(nga_wp_s_lmi_nf) +
  tm_fill(col = "Prob(Li)", 
          breaks=c(-Inf, 0.01, 0.025, 0.05, Inf),
          palette="-Blues", 
          title = "Local Moran's I pvalues") +
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "Non Functional Water Points by\nLocal Moran P-value",
            main.title.size = 1,
            main.title.position = "centre")

prob_nf

LGA boundary areas where Local Moran I pvalues are greater than 0.05 are not significant. This means the proportion of non functional water points in these areas are not outliers or cluster.

li_nf <- tm_shape(nga_wp_s_lmi_nf) +
  tm_fill(col = "Ii", 
          style = "pretty", 
          title = "Local Moran I statisyics ") +
  tm_layout(main.title = "Non Functional Water Points by\nLocal Moran Statistics",
            main.title.size = 1,
            main.title.position = "centre",
            legend.outside = TRUE) +
  tm_borders(alpha = 0.5)

li_nf

If we plot a Local Moran I statistics, we can see largely most of the LGA boundaries is a cluster or outlier and there is spatial autocorrelation between the boundaries.

5.4.2 Identifying whether LGA Boundary is a cluster or outlier

When Local Moran I statistics is significant, we can conclude the area is either a cluster or outlier. The first step is to plot the Moran scatter plot. The scatter plot shows the relationship between a chosen attribute and its spatially lagged values at neighbouring location. It helps us identify locations of clusters and outliers.

Clusters:

  • HH cluster means centre spatial unit value is high and neighbouring values are also high

  • LL cluster means centre spatial unit value is low and neighbouring values are also low

Outliers:

  • LH outlier means centre spatial unit value is low but neighbouring values are high

  • HL outlier means centre spatial unit value is high but neighbouring values are low

First, we scale the percentage of non functional water points in each polygon. This allows a fairer comparison between the spatially lagged average non functional water point percentage and the non functional water point of a polygon.

We standardise the data using scale().

nonfunc_std <- scale(nga_wp_s$pct_nonfunc) %>% 
  as.vector 

We use moran.plot() to plot the moran scatter plot.

moran.plot(nonfunc_std, knn_lw,
           labels=as.character(nga_wp_s$ADM2_EN),
           xlab="z-Non functional water point", 
           ylab="Spatially Lag z-Non functional water point",
           pch=19,
           cex = 0.3)

Interpretating the scatter plot:

The Moran coefficient is equivalent to the slope of regression line on Moran plot. For each axis, the dotted lines represent the average of that attribute. There are also 4 quadrants created by the dotted lines in scatter plot. Each quadrant represents a cluster or outlier.

Top right hand quadrant represents the HH cluster and bottom left represents the LL cluster. A cluster means the centre spatial unit is positively autocorrelated with its neighbours. HH cluster means a polygon is high value and neighbours also high value. LL cluster means a polygon is low value and neighbours low value.

Top left hand quadrant represents the LH outlier and bottom right represents the HL outlier. Outlier means the centre spatial unit is negatively autocorrelated with its neighbours. LH outlier means the polygon is low in value but neighbours are high in value so it is a low outlier. HL outlier means the polygon in the centre is high in value but neighbours are low so it is a high outlier.

We can further analyse to see which LGA boundaries are low or high outliers or low or high clusters. We initialise a zero vector with same length as local MI matrix.

quadrant <- vector(mode="numeric",length=nrow(localMI_nf))

We centre the spatially lagged variable around the mean.

nga_wp_s_lmi_nf$lag_pct <- lag.listw(knn_lw, nga_wp_s$pct_nonfunc)
DV <- nga_wp_s_lmi_nf$lag_pct - mean(nga_wp_s_lmi_nf$lag_pct)

We also compute average of the local MI statistics to centre the values around the mean. Level of significance will be set at 0.05.

LM_I <- localMI_nf[,1] - mean(localMI_nf[,1])   
signif <- 0.05

If LM_I >0, it is positive. It is a cluster. If LM_I<0 it is negative. It is outlier

If DV> 0 means the spatially lagged variable is high. If DV<0 means spatially lagged variable is low.

We put all insignificant LGA boundaries into class 0 and the rest into class 1, 2, 3, 4.

quadrant[DV <0 & LM_I>0] <- 1 # LL
quadrant[DV >0 & LM_I<0] <- 2 # LH
quadrant[DV <0 & LM_I<0] <- 3 # HL
quadrant[DV >0 & LM_I>0] <- 4 # HH
quadrant[localMI_nf[,5] > signif] <- 0
table(quadrant)
quadrant
  0   1   2   3   4 
551  88  14  22  99 
nga_wp_s_lmi_nf$quadrant <- quadrant
colors <- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
clusters <- c("insignificant", "low-low", "low-high", "high-low", "high-high")

nga_wp_s_lmi_nf <- rename(nga_wp_s_lmi_nf, "category"="quadrant")

tmap_mode("plot")

clus_nf <- tm_shape(nga_wp_s_lmi_nf) +
  tm_fill(col = "category",
          style = "cat",
          palette = colors[c(sort(unique(quadrant)))+1], 
          labels = clusters[c(sort(unique(quadrant)))+1])+
  tm_borders(alpha=0.5) +
  tm_layout(main.title = "Non Functional Water Points with\nSpatial Autocorrelation",
            main.title.size = 1,
            main.title.position = "centre")
clus_nf 

For non functional water points, we can see the there is a region of high value clusters located at South West of Nigeria while the low value cluster is diagnoally opposite and located at North East Nigeria.

For outliers, there are some LH outliers located around South West and off central while for HL outliers, they are scattered around the upper half of Nigeria, towards Northern and Eastern parts.

tmap_arrange(prob_nf,clus_nf)

tmap_arrange(li_nf,clus_nf)

5.4.3 Functional water points Analysis

Functional water points

In the following section, we will perform the same analysis as previously shown but for functional water points.

localMI_f <- localmoran(nga_wp_s$pct_func, knn_lw)

nga_wp_s_lmi_f <- cbind(nga_wp_s,localMI_f) %>%
  rename("Prob(Li)"="Pr.z....E.Ii..")
prob_f <- tm_shape(nga_wp_s_lmi_f) +
  tm_fill(col = "Prob(Li)", 
          breaks=c(-Inf, 0.01, 0.025, 0.05, Inf),
          palette="-Reds", 
          title = "Local Moran's I pvalues") +
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "Functional Water Points by\nLocal Moran P-value",
            main.title.size = 1,
            main.title.position = "centre")
prob_f

li_f <- tm_shape(nga_wp_s_lmi_f) +
  tm_fill(col = "Ii", 
          style = "pretty", 
          title = "Local Moran I statisyics ") +
  tm_layout(main.title = "Functional Water Points by\nLocal Moran Statistics",
            main.title.size = 1,
            main.title.position = "centre",
            legend.outside = TRUE) +
  tm_borders(alpha = 0.5)

li_f

class_f <- vector(mode="numeric",length=nrow(localMI_f))
nga_wp_s_lmi_f$lag_pct <- lag.listw(knn_lw, nga_wp_s$pct_func)
DV <- nga_wp_s_lmi_f$lag_pct - mean(nga_wp_s_lmi_f$lag_pct)
LM_I <- localMI_f[,1] - mean(localMI_f[,1])   
signif <- 0.05
class_f[DV <0 & LM_I>0] <- 1 # LL
class_f[DV >0 & LM_I<0] <- 2 # LH
class_f[DV <0 & LM_I<0] <- 3 # HL
class_f[DV >0 & LM_I>0] <- 4 # HH
class_f[localMI_f[,5] > signif] <- 0
nga_wp_s_lmi_f$class <- class_f
colors <-  c("#ffffff", "#CCFFFF", "#CCCCFF", "#FFFF99", "#FF6666")
clusters <- c("insignificant", "low-low", "low-high", "high-low", "high-high")

clus_f <- tm_shape(nga_wp_s_lmi_f) +
  tm_fill(col = "class",
          style = "cat",
          palette = colors[c(sort(unique(class_f)))+1], 
          labels = clusters[c(sort(unique(class_f)))+1])+
  tm_borders(alpha=0.5) +
  tm_layout(main.title = "Functional Water Points with\nSpatial Autocorrelation",
            main.title.size = 1,
            main.title.position = "centre")
clus_f 

tmap_arrange(clus_nf, clus_f)

From both graphs above, we can conclude statisitcally and significantly that there are more functional water points in the Northern/ upper half of Nigeria than southern Nigeria/ lower half.


6. Task 5

Objective: Performing hotspot and cold spot analysis

6.1 Hotspot Analysis

Apart from cluster and outlier detection, we can use LISA to determine hotspots and coldspots. We will use a different statistics known as local Getis and Ord’s G-statistics.

H0: Spatial distribution of feature attribute is random spatial process

H1: Spatial distribution of feature attribute is not a random spatial process

G -statistics >0 means there is association with relatively high values of the surrounding locations.

G -statistics< 0 means there is association with relatively low values of the surrounding locations.

It looks at neighbours within a defined proximity to identify where either high or low value areas spatially. Here, statistically significant hot-spots are recognised as areas of high values where other areas neighbouring are high in values too. Coldspots are areas of low values where other areas neighbouring are low in values.

Hotspot: high values cluster , Coldspot: low values cluster

Non Functional Water Point

gi <- localG(nga_wp_s$pct_nonfunc, knn_lw)
npa_wp_s_nf_gi <- cbind(nga_wp_s, as.matrix(gi)) %>%
  rename("gstat_adaptive" = "as.matrix.gi.")
head(gi)
[1]  1.1861866  1.8982955 -4.2114695  0.4332483  2.9741707 -1.7100084
hotspot_nf <- tm_shape(npa_wp_s_nf_gi) + 
  tm_fill(col = "gstat_adaptive", 
          style = "pretty", 
          palette="-RdBu", 
          title = "local Gi") +
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "Non Functional Water Points Hotspots",
            main.title.size = 1,
            main.title.position = "centre")
hotspot_nf

Based on the above, for non functional water points proportion, there is higher proportion of non functional water points in South and South West Nigeria (hot spot). There is lower proportion of non functional water points in North and North East Nigeria. This conclusion aligns with the cluster and outlier analysis. Additionally, in the west, there seems to be cold spot which means there is lower percentage of non functional water point.

Functional Water Point

gi_f <- localG(nga_wp_s$pct_func, knn_lw)
npa_wp_s_f_gi <- cbind(nga_wp_s, as.matrix(gi_f)) %>%
  rename("gstat_adaptive" = "as.matrix.gi_f.")
hotspot_f <- tm_shape(npa_wp_s_f_gi) + 
  tm_fill(col = "gstat_adaptive", 
          style = "pretty", 
          palette="-RdBu", 
          title = "local Gi") +
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "Functional Water Points Hotspots",
            main.title.size = 1,
            main.title.position = "centre")
hotspot_f

Red areas are hot spots with higher proportion of functioning water point.


7. Data Interpretation

7.1 Non Functional Water point

We can put count analysis, cluster analysis and hotspot analysis side by side using tmap_arrange() function.

tmap_arrange(clus_nf,hotspot_nf, nf_4)
Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering

Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering

Overall, the 3 graphs are largely aligned with one another. From the above we can see that we should not rely only on 1 graph for our data analytics interpretation.

If we have looked only at the percentage analysis by LGA level (bottom left graph), we would have placed more urgency for repairs in LGA areas where there are higher percentage of non functional water points. This will mean repairing the water points in North West Nigeria regions and LGA areas dispersed across Nigeria where repair efforts will be divided.

However when we look at the spatial autocorrelation analysis (top left), we realise there is high portion of non functioning water point cluster in the South West Nigeria. Repair efforts should be focused in this area first. This was not being highlighted in the graph of percentage analysis by LGA level.

The hotspot analysis has also shown us within a high high cluster, not all LGA areas in the high high cluster have the same severity of non functional water points. For example within the high high cluster identified by the cluster graph, there is a point with hotspot higher than the other surrounding hotspot as shown by the hotspot graph.

The region of higher spatial autorcorrelation and hotspot should be repaired first. This is because a hotspot and autocorrelation mean that the specific LGA area and its neighbouring LGA area have high non functional water points. These areas are areas with lower access to water and should be prioritized to provide a steady water supply to its people.

We take our analysis further be finding out the LGA areas where there are highest percentage of non functional water points,is a high high cluster and a hotpot with the below code chunk.

npa_wp_s_nf_gi_cpy <- npa_wp_s_nf_gi
npa_nf <- cbind(npa_wp_s_nf_gi_cpy,nga_wp_s_lmi_nf$category)

npa_nf %>%
  filter(nga_wp_s_lmi_nf.category == 4) %>%
  arrange(.,desc(gstat_adaptive), desc(pct_nonfunc)) %>%
  select(ADM2_EN) %>%
  head(5)
Simple feature collection with 5 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 311733 ymin: 73672.86 xmax: 625884.3 ymax: 421507.8
Projected CRS: Minna / Nigeria West Belt
    ADM2_EN                       geometry
1 Etim Ekpo MULTIPOLYGON (((583649.6 11...
2 Oruk Anam MULTIPOLYGON (((596319.7 98...
3       Apa MULTIPOLYGON (((624282.4 41...
4  Ukanafun MULTIPOLYGON (((581423 1045...
5    Burutu MULTIPOLYGON (((364659.6 15...

Etim Ekpo, Oruk Anam, Apa, Ukanafun and Burutu should be areas where the water humanitarian aid should focus first as these are the areas with lowest access to water.

7.2 Functional Water point

tmap_arrange(clus_f,hotspot_f)

tmap_arrange(a4)
Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering

Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering

From the 3 graphs above, we can see the northern region of Nigeria has a high proportion of functioning water point. Research study can be done on it to identify what are the reasons why these areas have statically significantly higher functioning water points than other areas of Nigeria. The learning can then be applied to other areas of Nigeria in managing water points.

We can also see the tip of the North West Region in fact does not have a high percentage of functioning water points too. Since it does not have high non functioning, and high functioning water points, data here seems to be not clean or missing and more should be looked into

References

https://geocompr.github.io/post/2019/tmap-color-scales/

https://cran.r-project.org/web/packages/tmap/vignettes/tmap-getstarted.html